Peer-review in a world with rational scientists: Toward selection of the average 



Stefan Thurner 1,2,3,* and Rudolf Hanel 1

1 Section of Science of Complex Systems, Medical University of Vienna, Spitalgasse 23, A-1090, Austria 
2 Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA 
3 IIASA, Schlossplatz 1, A-2361 Laxenburg, Austria

One of the virtues of peer review is that it provides a self-regulating selection mechanism for 
scientific work, papers and projects. Peer review as a selection mechanism is hard to evaluate in 
terms of its efficiency. Serious efforts to understand its strengths and weaknesses have not yet led
to clear answers. In theory peer review works if the involved parties (editors and referees) conform 
to a set of requirements, such as love for high quality science, objectiveness, and absence of biases, 
nepotism, friend and clique networks, selfishness, etc. If these requirements are violated, what is 
the effect on the selection of high quality work? We study this question with a simple agent based 
model. In particular we are interested in the effects of rational referees, who might not have any 
incentive to see high quality work other than their own published or promoted. We find that a small 
fraction of incorrect (selfish or rational) referees can drastically reduce the quality of the published 
(accepted) scientific standard. We quantify the fraction for which peer review will no longer select 
better than pure chance. Decline of quality of accepted scientific work is shown as a function of the 
fraction of rational and unqualified referees. We show how a simple quality-increasing policy of e.g. 
a journal can lead to a loss in overall scientific quality, and how mutual support-networks of authors 
and referees deteriorate the system.

I. INTRODUCTION

"Dear Sir, We had sent you our manuscript for publica- 
tion and had not authorized you to show it to specialists 
before it is printed..." was A. Einstein's reply to the edi- 
tor of Physical Review when he received the rejection of 
a 1936 paper with N. Rosen. To Einstein, peer review
was not a familiar system at the time. Quality selection
of scientific work was dominated by autonomous decisions
of editors of unquestionable authority (e.g. Max
Planck for Annalen der Physik) or, for the more senior
scientists, by academy memberships. Since those days
scientific publishing has changed to a practically all-dominating
peer review system, worldwide and across disciplines.
Today most journals employ single-blind peer review.
This transition to peer review set in after WWII,
partly as a response to the increase of scientific research
at that time.

Peer review is a system that subjects scientific work 
to scrutiny of experts in the field. If the standards of 
scientific rigor, technical correctness, novelty and the cri- 
terion of sufficient interest are approved by usually 2-3 
peers, the work gets published. In this way the authority 
of science - represented by a team of experts of a par- 
ticular community - self-regulates its quality standards. 
Ideally, these standards should be kept as high as possible.
The peer review system is largely perceived as meeting
this aim, and as one of the best possible ways of doing so,
with, at present, few alternatives.

Peer review suffers from well-known problems,
perhaps the most striking being that its efficiency
is largely unknown and practically untested [5]. The
Cochrane collaboration has not found evidence that peer 
review works, nor that it does not [6]. In an experiment
peer review has been tested for its ability to detect er- 
rors in scientific papers. A paper with 8 deliberate errors 
was sent to 420 reviewers. The average number of de- 
tected errors was 2, nobody spotted 5 or more errors, 
and 16% did not spot a single error. Further, peer
review leaves room for bias on both the reviewer's and
editor's side. Evidence for the existence of nationality,
language, speciality, reputation and gender biases has been
reported.

A fundamental problem of the peer review process is 
that it introduces conflicting interests or moral hazard 
problems in a variety of situations. By accepting high 
quality work and thus promoting it, the referee risks
drawing attention to these ideas and possibly away from
her own. A post-doc looking for his next position may
be reluctant to accept a good paper by a peer who
competes for the same position. A big shot in a particular
field might fear risking his 'guru status' by accepting
challenging and maybe better ideas than his own, etc. In
other words, referees who optimize their overall 'utility'
(status, papers, fame, position, ...) might find that accepting
good scientific work of others is in direct conflict
with their own utility maximization. In the following we
call utility-optimizing referees rational. They avoid
accepting work better than their own.

It is clear that in the presence of referees with con- 
flicting interests the quality selection aspect of the peer 
review system will work sub-optimally. Here we present 
a simple agent based model to assess the robustness of 
the quality selection component of peer review under the 
presence of rational referees. The choice of an agent
based model is motivated by the absence of alternatives
to assess efficiency. It is possible to demonstrate that
the concept of peer review works fine as long as rational
referees are absent. However, the system is extremely
sensitive to perturbations. The model allows us to illustrate
the impact of rational behavior, of friendship networks
(nepotism), and of counter-intuitive effects, e.g. from
quality-increasing management policies implemented by
journals or funding agencies.

Here we solely focus on the quality selection aspect of 
peer review and ignore its other potential benefits such 
as improvement of papers through spell checking, error 
detection, completing references, cost efficiency, etc. Also 
we will not discuss its inadequacy to select novel ideas 
[10, 11], nor its experimentally confirmed confirmatory
bias [12, 13]. From the quality selection perspective only,
we find that unless the fractions of rational and unreliable
referees are kept well below 30%, peer review will not
perform better than pure chance, e.g. throwing a coin.
It is interesting to see this finding in the light of a recently
proposed semi-random selection process, which has been
shown to have an economic value-adding component in the
context of R&D projects.

II. MODEL 

We consider a scientific community of N productive 
scientists. Each scientist produces one paper every 2 
time-units t (e.g. one paper every 2 weeks, years or 
decades). We make two assumptions about the authors: 

(1) The scientific quality of authors follows a Gaussian
distribution, i.e. each author i is assigned an
'IQ' index, $Q_i^{\mathrm{author}}$, drawn from a normal distribution
$Q_i^{\mathrm{author}} \in N(100, \sigma_{\mathrm{author}}^2)$. We consider mature scientists,
i.e. $Q_i^{\mathrm{author}}$ is constant over time (no learning or
intellectual decline). We set $\sigma_{\mathrm{author}} = 10$.

(2) Author i produces papers which vary in quality,
fluctuating around his average quality $Q_i^{\mathrm{author}}$. We
assume these fluctuations to be Gaussian with variance
$\sigma_{\mathrm{quality}}^2$. The quality of a paper submitted by
i at timestep t is $Q_i^{\mathrm{submit}}(t) \in N(Q_i^{\mathrm{author}}, \sigma_{\mathrm{quality}}^2)$.
For simplicity $\sigma_{\mathrm{quality}} = 5$ is the same for all authors.
The distribution of all submitted papers thus
is $N(100, \sigma_{\mathrm{author}}^2 + \sigma_{\mathrm{quality}}^2)$. The essence of the review
process is now to select the good papers from these
submissions.
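The two assumptions can be illustrated with a minimal sampling sketch. It is not the authors' code: the function names, the sample size and the fixed seed are hypothetical choices, and only the means and standard deviations (100, 10 and 5) are taken from the text. The sketch checks numerically that independent Gaussian variances add, so the submitted papers have a standard deviation of about sqrt(10^2 + 5^2) = sqrt(125), roughly 11.2.

```python
import random
import statistics

SIGMA_AUTHOR = 10.0   # std of author quality (value from the text)
SIGMA_QUALITY = 5.0   # std of a paper around its author's quality (value from the text)

def draw_authors(n, rng):
    """Draw a fixed quality index for each of n authors from N(100, sigma_author^2)."""
    return [rng.gauss(100.0, SIGMA_AUTHOR) for _ in range(n)]

def draw_submissions(author_quality, rng):
    """Draw one paper per author, scattered around that author's quality."""
    return [rng.gauss(q, SIGMA_QUALITY) for q in author_quality]

rng = random.Random(0)                 # fixed seed: an arbitrary choice for reproducibility
authors = draw_authors(20000, rng)     # sample size is an assumption, chosen for a stable estimate
papers = draw_submissions(authors, rng)

# Independent Gaussian variances add: sqrt(10^2 + 5^2) = sqrt(125)
expected_std = (SIGMA_AUTHOR ** 2 + SIGMA_QUALITY ** 2) ** 0.5
print(statistics.mean(papers), statistics.pstdev(papers), expected_std)
```

The empirical mean and standard deviation of `papers` should land close to 100 and `expected_std`, up to sampling noise.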

At every timestep each submitted paper is sent to 2 
independent referees, randomly chosen from the N scientists.
The possibility of self-review is excluded. Each
reviewer produces a binary recommendation within the
same timestep: 'accept' or 'reject'. If a paper gets 2 'accept'
recommendations it is accepted; if it gets 2 'reject', it is rejected; if
there is a tie (1 'accept' and 1 'reject') it gets accepted
with a probability of 0.5. Upon acceptance a submitted
paper becomes an accepted paper, the quality of which
is now denoted by $Q_i^{\mathrm{accept}}(t)$. We assume each scientist,
when acting as a referee, to belong to one of five referee
types:

• The correct: Accepts good and rejects bad papers.
A paper from author j is considered good
by referee i if its quality is above a minimum requirement
$Q^{\min}$. This minimum requirement can
be modeled in various ways. Here we say that
the minimum paper quality, set e.g. by a journal,
is that it lies within the top q-quantile of
recently published papers in the field. To compute
this quantile we first use a simple exponential
moving average to compute the average:
$M(t) = \lambda M(t-1) + (1-\lambda) \langle Q_i^{\mathrm{accept}}(t-1) \rangle_i$, with $\lambda = 0.1$.
$\langle \cdot \rangle_i$ indicates the average over all accepted papers. The
top q-quantile is now everything above this moving
average plus a times one standard deviation
of the distribution of recently published papers,
$Q^{\min}(t) = M(t) + a \, \mathrm{std}[Q_i^{\mathrm{accept}}(t-1)]$. In other
words, referee i accepts a paper if
$Q_j^{\mathrm{submit}}(t) > Q^{\min}(t)$.

• The stupid (random): This referee cannot judge
the quality of a paper (e.g. because of incompetence
or lack of time) and takes a random decision
on a paper.

• The rational: The rational referee knows that
work better than his/her own might draw attention
away from his/her own work. For him/her there is
no incentive to accept anything better than his/her
own work, while it might be fine to accept worse
quality. Referee i rejects papers from authors j
whenever they are above his/her quality index,
$Q_i^{\mathrm{author}} < Q_j^{\mathrm{submit}}(t)$, and accepts papers when they
are between a minimum quality threshold and his/her
own index, i.e. $Q_j^{\mathrm{submit}}(t) \in [Q_{\min}^{\mathrm{submit}}, Q_i^{\mathrm{author}}]$.
For simplicity we set $Q_{\min}^{\mathrm{submit}} = 90$ constant.

• The altruist: Accepts all papers. The referee 
might simply not care or could fear that his identity 
might get disclosed through e.g. editorial mistakes. 

• The misanthropist: Rejects all papers. 
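The threshold used by the correct referee can be sketched as a single update step. This is an illustration under stated assumptions, not the authors' implementation: the function name `update_threshold`, the parameter name `lam` for the smoothing weight (0.1 in the text), the use of the population standard deviation, and the fallback for a timestep without accepted papers are all hypothetical choices.

```python
import statistics

def update_threshold(m_prev, accepted_prev, lam=0.1, a=0.0):
    """Return (M(t), Q_min(t)) from M(t-1) and the papers accepted at t-1.

    M(t)     = lam * M(t-1) + (1 - lam) * mean(accepted_prev)
    Q_min(t) = M(t) + a * std(accepted_prev)
    """
    if not accepted_prev:
        # Assumption: with no accepted papers, keep the previous average as threshold.
        return m_prev, m_prev
    m_new = lam * m_prev + (1.0 - lam) * statistics.mean(accepted_prev)
    spread = statistics.pstdev(accepted_prev)   # assumption: population std
    return m_new, m_new + a * spread

# Toy input: previous average 100, three papers accepted in the last timestep.
m, q_min = update_threshold(100.0, [110.0, 120.0, 130.0], lam=0.1, a=1.0)
print(m, q_min)
```

With a weight of 0.1 on the old average, the threshold follows the most recent batch of accepted papers closely; a = 0 reduces the rule to "accept above the moving average".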

Recommendations of the altruist and misanthropist 
type affect the total number of accepted papers but do 
not influence the quality selection process. We therefore 
neglect them and remain with three types of referees: 
the correct, the random and the rational, their respective 
fractions being $f_c$, $f_{\mathrm{rnd}}$ and $f_r$, with $f_c + f_{\mathrm{rnd}} + f_r = 1$.
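The three remaining referee types and the two-referee decision rule can be written as small functions. This is a hedged sketch: all function names are hypothetical, the random referee is assumed to toss a fair coin (the text only says the decision is random), and the lower bound of 90 for the rational referee is the constant given in the text.

```python
import random

Q_SUBMIT_MIN = 90.0   # lower acceptance bound of rational referees (value from the text)

def correct_vote(paper_q, q_min):
    """Correct referee: accept if the paper exceeds the current minimum requirement."""
    return paper_q > q_min

def random_vote(rng):
    """Random referee: assumption that 'random' means a fair coin."""
    return rng.random() < 0.5

def rational_vote(paper_q, referee_q):
    """Rational referee: accept only papers between the lower bound and the referee's own quality."""
    return Q_SUBMIT_MIN <= paper_q <= referee_q

def editorial_decision(vote_1, vote_2, rng):
    """Two accepts: accept. Two rejects: reject. Tie: accept with probability 0.5."""
    if vote_1 == vote_2:
        return vote_1
    return rng.random() < 0.5

rng = random.Random(0)
# A paper of quality 105 seen by a correct referee (threshold 100) and a rational referee of quality 100.
votes = (correct_vote(105.0, 100.0), rational_vote(105.0, 100.0))
print(votes, editorial_decision(votes[0], votes[1], rng))
```

In this toy case the correct referee accepts and the rational referee rejects, so the paper's fate is left to the tie-breaking coin.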

FIG. 1: Average quality of accepted papers at each timestep
for a community of correct referees (a) and one composed of
10% rational vs. 90% correct referees (c). In (b) and (d) the
histograms of the respective timeseries (a) and (c) are shown
(red). For reference we indicate the quality distribution of
submitted papers (blue line in (b) and (d)). Note that the
average quality of submitted papers is 100. The average paper
quality with 10% rational referees drops by about 10, i.e. one
standard deviation of the submitted paper distribution.

To model effects of mutually favoring friendship networks
where friends (or members of some group such as
co-authors) accept each other's work regardless of quality,
we introduce a variant of the model. We randomly
choose L scientists who belong to a (say co-authorship)
network within the community. N - L authors do not
belong to the network. Regardless of referee type, if the
author i of a submitted paper and the referee j both belong
to the same group, the paper is automatically accepted.
The relevant parameter is relative group size, n = L/N.
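The network variant amounts to an override applied before a referee's ordinary recommendation. The following sketch uses hypothetical names (`pick_network`, `vote_with_network`) and assumes that L is obtained by rounding n times N; "automatically accepted" is read here as the network referee recommending 'accept'.

```python
import random

def pick_network(n_scientists, rel_size, rng):
    """Randomly choose L = round(rel_size * N) scientists as network members."""
    size = round(rel_size * n_scientists)   # assumption: L is rounded to an integer
    return set(rng.sample(range(n_scientists), size))

def vote_with_network(author, referee, members, ordinary_vote):
    """Members accept each other's papers regardless of quality or referee type."""
    if author in members and referee in members:
        return True
    return ordinary_vote

rng = random.Random(0)
members = pick_network(1000, 0.1, rng)      # n = 0.1, as in the text
inside = sorted(members)[:2]                # two scientists inside the network
outside = next(k for k in range(1000) if k not in members)

print(vote_with_network(inside[0], inside[1], members, ordinary_vote=False))
print(vote_with_network(inside[0], outside, members, ordinary_vote=False))
```

Only the pairing of a member author with a member referee triggers the override; every other pairing falls back to the referee's usual behavior.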

We model the role of journals or funding agencies by
allowing them to ask referees to accept papers only if
they fall within the top q-quantile. In the notation
introduced above, journals can specify their value of a. Unless
stated explicitly, we set a = 0, i.e. referees are asked to
accept above-average papers. Note that only the correct
referees will comply with this requirement.



III. RESULTS 

We implement the above model in a computer simulation
for N = 1000 scientists and 500 timesteps. We first
set a = 0, i.e. referees are asked to accept above-average
articles only.
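A compact simulation loop along these lines could look as follows. It is a sketch under several assumptions that the text does not specify and is not the authors' implementation: authors alternate so that each submits every second timestep, the two referees are distinct, the moving average starts at 100, a random referee tosses a fair coin, and the population standard deviation is used. The function name `simulate` and its parameters are hypothetical, and the run below uses a reduced community (400 scientists, 100 timesteps) only to keep it fast, so its numbers are not expected to match the figures.

```python
import random
import statistics

def simulate(n=1000, steps=500, f_r=0.0, f_rnd=0.0, a=0.0, lam=0.1, seed=0):
    """Return the mean accepted paper quality for every timestep with acceptances."""
    rng = random.Random(seed)
    author_q = [rng.gauss(100.0, 10.0) for _ in range(n)]
    f_c = max(0.0, 1.0 - f_r - f_rnd)
    kinds = rng.choices(["rational", "random", "correct"],
                        weights=[f_r, f_rnd, f_c], k=n)
    m = 100.0      # assumption: moving average starts at the population mean
    q_min = m
    history = []
    for t in range(steps):
        accepted = []
        for i in range(n):
            if (i + t) % 2:                  # assumption: one paper every 2 timesteps
                continue
            paper = rng.gauss(author_q[i], 5.0)
            votes = []
            for r in rng.sample(range(n - 1), 2):   # assumption: two distinct referees
                j = r if r < i else r + 1           # skip index i: no self-review
                if kinds[j] == "correct":
                    votes.append(paper > q_min)
                elif kinds[j] == "random":
                    votes.append(rng.random() < 0.5)
                else:                               # rational referee
                    votes.append(90.0 <= paper <= author_q[j])
            # Unanimous votes decide; a tie is accepted with probability 0.5.
            ok = votes[0] if votes[0] == votes[1] else rng.random() < 0.5
            if ok:
                accepted.append(paper)
        if accepted:
            mean_acc = statistics.fmean(accepted)
            history.append(mean_acc)
            # Threshold for the next timestep, built from this timestep's accepted papers.
            m = lam * m + (1.0 - lam) * mean_acc
            q_min = m + a * statistics.pstdev(accepted)
    return history

base = simulate(n=400, steps=100)              # all referees correct
mixed = simulate(n=400, steps=100, f_r=0.5)    # half of the referees rational
print(statistics.fmean(base), statistics.fmean(mixed))
```

The comparison between `base` and `mixed` is meant only as a qualitative check that rational referees pull the accepted quality toward the submitted average.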

In Fig. 1 we show the time evolution of the average accepted
quality of papers at each timestep. In (a) the situation
is shown for a community of 100% correct referees.
After a short transition time the average quality
of accepted papers rises above 120, i.e. only top quality
papers are selected. In Fig. 1 (b) we show the histogram
of the data points of Fig. 1 (a), i.e. accepted paper quality
(red) over 500 timesteps. It is clear that the average
of the accepted papers is about 2 standard deviations
above the mean of the submitted papers (100). The refereeing
process works at its best: only the very best papers
are published. In Figs. 1 (c) and (d) we show the same
as above, now for a community composed of 10% rational
and 90% correct referees. It is immediately clear that the
average quality of the accepted papers drops drastically:
the same refereeing process with a small fraction of rational
referees (10%) brings the quality down by almost
one standard deviation of submitted article quality.

FIG. 2: Average quality of accepted papers as a function of
the abundance of rational referees. The three different scenarios
shown correspond to three fractions of random referees,
$f_{\mathrm{rnd}}$ = 0, 0.1 and 0.2. It is obvious that even small fractions
of rational referees bring down the system to select papers of
close to average quality. a = n = 0.

The decline of average accepted paper quality as a
function of the fraction of rational referees in the community
is shown in Fig. 2. For fractions of over 70% of
rational referees (an admittedly dark scenario) the selection
mechanism in the refereeing process completely vanishes.
The average quality of selected work falls within
a narrow band around average quality (100) and the peer review
process turns absurd. Fig. 2 further shows the situation
for two fixed fractions of random referees, $f_{\mathrm{rnd}}$ = 0.1 and
0.2. Clearly, the addition of random referees brings the
accepted paper quality down in a dramatic way. Quantitatively,
adding about 10% of random referees has practically
the same effect as adding 10% of rational referees,
see Fig. 2.

In Fig. 3 we show the results from a simulation with
a friendship network present, whose members accept all
papers of authors belonging to the same network, regardless
of paper quality or referee type. We set n = 0.1, i.e.
10% of the authors are part of such a network. Again
we plot the scenario as a function of the abundance of rational
referees, with $f_{\mathrm{rnd}}$ = 0 and a = 0. The figure compares
the average quality of papers of authors belonging to the
network (squares) to those of non-members (circles). As
expected, the quality of papers by members of the network is
drastically lower than for the rest, where the referee system
is not corrupted. The average quality of accepted papers
in the total society drops from about 122 to 117 for
$f_r$ = 0. The acceptance rate of authors belonging to the
network is about a factor of 2 larger than for the correct
group, irrespective of the fraction of rational referees. In
absolute terms the acceptance rate for the authors outside
the network is about 5%. In the survey [14] a rate
of acceptance (within the first round) of about 8% was
reported, placing the model results in the right ballpark.

Finally, in Fig. 4 we show the effect of the journal asking
referees to raise the quality threshold, i.e. a > 0. The
circles reflect the situation as in Fig. 2 (for $f_{\mathrm{rnd}}$ = 0).
Once referees are asked to raise their threshold from
a = 0 by one standard deviation to a = 1, quality drops
(squares). The reason is easily understandable: while correct
referees, by raising their standards, accept fewer and
fewer (good) papers, the bias toward bad papers of the rational
referees remains the same. The rational referees
gain more 'weight' in the selection process with respect
to the correct ones, and the quality deteriorates.

FIG. 3: Situation with a subgroup of authors (n = 0.1) who
accept all submissions from authors belonging to the same
group. The average accepted paper quality of the authors
from the group (squares) is compared to that of those not
belonging to the cartel (circles), again shown as a function
of the fraction of rational referees, $f_r$. Obviously, when the
refereeing system is overridden by a 'friendship-bias' no quality
selection is possible. $f_{\mathrm{rnd}}$ = 0, a = 0.

FIG. 4: Effect of journals asking for higher paper quality
standards. The cases a = 0 (circles) and a = 1 (squares) are
compared, which is a shift of one standard deviation. The
latter leads to a significant drop of average accepted paper
quality. $f_{\mathrm{rnd}}$ = n = 0.



IV. CONCLUSION 

With a simple agent based model we have shown that 
the standard peer reviewing process is not a robust way 
for quality selection of scientific work. The presence of 
relatively small fractions of 'rational' and/or 'random' 
referees (deviating from correct behavior) considerably 
reduces the average quality of published or sponsored 
science as a whole. We quantified the effect of nepo- 
tism networks and demonstrated that under the presence 
of 'rational' referees quality-increasing strategies of jour- 
nals or funding agencies can lead to adverse effects on a 
systemic level. Our message is clear: if it cannot be guaranteed
that the fraction of 'rational' and 'random' referees
is confined to a very small number, the peer review
system will not perform much better than accepting
papers by throwing an (unbiased!) coin. For example,
if the fractions of rational, random and correct referees
are approximately 1/3 each, the quality selection aspect
of peer review practically vanishes. We think that it is
important to find ways to assess the actual
numbers within the different scientific communities. If it
turns out that e.g. $f_{\mathrm{rnd}}$ and $f_r$ exceed 30%, it would be
necessary to re-think the prevailing system. Under these
circumstances, which are not totally unrealistic for certain
communities, a purely random refereeing system
would perform equally well and would at the same time
save millions of man-hours spent on refereeing every year.
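The coin-toss baseline referred to above is easy to make concrete. In this minimal sketch every submitted paper is accepted with probability 0.5, independent of its quality, so the accepted papers should have the same average quality as the submissions. The sample size, the seed and the shortcut of drawing a fresh author for every paper are assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed: an arbitrary choice for reproducibility

# Each paper: author quality from N(100, 10^2), paper quality scattered with std 5.
submitted = [rng.gauss(rng.gauss(100.0, 10.0), 5.0) for _ in range(20000)]

# Pure chance: an unbiased coin decides, without looking at the paper.
coin_accepted = [q for q in submitted if rng.random() < 0.5]

gap = statistics.fmean(coin_accepted) - statistics.fmean(submitted)
print(statistics.fmean(submitted), statistics.fmean(coin_accepted), gap)
```

Any refereeing scheme whose accepted papers average close to 100 is therefore indistinguishable from this baseline in terms of quality selection.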

We thank Elise S. Brezis for most helpful comments.